Gene Expression Analysis with No Sequence Data: Study on Reeves’s Muntjac (Muntiacus reevesi)

Despite scientific progress, the gene sequences of many species not commonly used in research have not yet been determined. This makes molecular studies on such animals difficult, because gene sequences are the basic information required by many techniques. In this study, we attempted to design primers for real-time PCR analysis based on a comparative analysis of selected gene sequences from species related to Reeves's muntjac (Muntiacus reevesi) and on the identification of highly conserved regions. Sequencing of the PCR products and alignment of the obtained sequences with the GenBank collection showed that all selected primers gave products highly similar (> 90%) to the intended target among the compared species. This led us to conclude that our primers may be used for further gene expression analyses.


Introduction
Real-time PCR is a basic method for gene expression analysis and is widely used by scientists worldwide [1]. The technique uses a DNA polymerase to amplify a specific region of a target nucleic acid and requires specific primers, i.e., oligonucleotides that flank the region of interest. Primers are designed on the basis of sequences available in GenBank, a collection of publicly available DNA sequences. The GenBank database is constantly updated; however, its collection is still incomplete. The genomes of relatively few animal species have been fully sequenced, and many available sequences, especially for species not commonly used in research, are partial or putative. Thus, the aim of this study is to develop a method of designing primers for real-time PCR analysis of selected genes when no sequence data for the particular animal are available in GenBank. The specific aim is to design and validate primers for determining the mRNA expression of eleven genes in tissue samples collected from the gastrointestinal tract (GIT) of Reeves's muntjac (Muntiacus reevesi).

Materials and Methods
Tissue samples collected from male Reeves's muntjacs during a larger research project were used in this study (Local Institutional Animal Care and Use Committee, Krakow, Poland, resolution no. 150/2018). In brief, the animals were assigned to a nutritional study and, throughout the study period, were housed individually and fed one of three experimental diets. The diets consisted of dehydrated chopped lucerne (ad libitum), high-fiber pellets (100 g/day) and wheat bran (30 g/day), without (group 0) or with the addition of 10 or 20 g/day of a glucose, fructose and sucrose mixture (group 10 and group 20, respectively). After 14 days of adaptation to the experimental diets, GIT samples were collected from four regions: the atrium ruminis and the ventral sac of the rumen (rumen papillae), and the duodenum and the small intestine (whole-thickness tissue samples). The tissue samples were used for mRNA isolation followed by reverse transcription, as described previously [2]. The cDNA from each analyzed part of the GIT, obtained from four randomly selected animals, was mixed to create a pooled sample, which was used for further analyses.
No sequences of the selected genes were available for Reeves's muntjac in GenBank. Moreover, the mRNA expression of these genes has not yet been investigated in Reeves's muntjac, so no literature data were available on this matter. Since gene structure is similar between related species [3], we collected all GenBank sequences of the selected genes from species related to Reeves's muntjac (suborder Ruminantia; infraorder Pecora; families Cervidae and Bovidae; species listed in Supplementary Material, Table S1). For each selected gene, we compared all available mRNA sequences (i.e., partial sequences, predicted sequences and transcript variants). We then aligned all sequences and identified highly conserved regions (BLAST [4]). Within these regions, we designed specific primers using Primer-BLAST (Table 1). The primers were designed to match the most conserved sites, avoiding, where possible, even single nucleotide polymorphisms (especially within the primer binding sites), and to give the longest possible product (maximum amplicon length < 250 bp, see Discussion), so that identification of the product would be as reliable as possible. At least two primer pairs were designed for each gene. These primers were used in a real-time PCR analysis in which the mRNA expression of the selected genes was determined on a StepOnePlus™ Real-Time PCR System (Life Technologies Corporation, Carlsbad, California, USA). The reaction mixture (10 µL total volume) contained PowerUp™ SYBR® Green Master Mix (Thermo Fisher Scientific, Waltham, MA, USA), 900 nM of each primer (forward and reverse), 1 µL of cDNA sample (0.87 µg) and nuclease-free water (Thermo Fisher Scientific, Waltham, MA, USA).
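The idea of locating conserved regions across related species can be illustrated with a short sketch. It assumes a multiple alignment has already been produced (the study used BLAST for this step) and simply reports runs of alignment columns in which every species carries the same nucleotide; such runs are candidate primer binding sites. The function names, the gap handling and the default minimum run length of 18 nt (a typical primer length) are assumptions for illustration only, not the authors' procedure.

```python
from typing import List, Tuple


def conserved_columns(alignment: List[str]) -> List[bool]:
    """Flag alignment columns in which every sequence has the same nucleotide.

    Gap characters ('-') are treated as non-conserved so that candidate
    primer sites avoid indels between species.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    flags = []
    for i in range(length):
        column = {seq[i].upper() for seq in alignment}
        flags.append(len(column) == 1 and "-" not in column)
    return flags


def conserved_windows(alignment: List[str], min_len: int = 18) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of fully conserved columns of at least min_len."""
    flags = conserved_columns(alignment)
    windows = []
    start = None
    for i, ok in enumerate(flags + [False]):  # trailing False closes a run at the end
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None
    return windows


# Toy alignment of three hypothetical species (not real GenBank data).
toy_alignment = [
    "ACGTACGTAAGGCCTTA",
    "ACGTACGTAAGGCCTTC",
    "ACGTTCGTAAGGCCTTA",
]
print(conserved_windows(toy_alignment, min_len=5))  # prints [(5, 16)]
```

Requiring complete identity in every column is the strictest reading of "avoiding even single nucleotide polymorphisms"; where no such run is long enough, a relaxed criterion (e.g., tolerating a mismatch away from the primer's 3' end) would be needed.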
The best primer pair for each gene was chosen on the basis of the shape of the amplification curve, the purity of the product (assessed from the melting curve and by agarose gel electrophoresis) and the quantity of the product (assessed from the Ct value) [5]. The product of each reaction with the chosen primer pair was sequenced in both the forward and reverse directions. Sequencing was carried out by the Sanger method using the BigDye® Terminator v3.1 kit (Life Technologies, Carlsbad, California, USA) in a commercial laboratory (Genomed, Warsaw, Poland). The products of the sequencing reactions were separated on a 3730xl DNA Analyzer capillary sequencer (Applied Biosystems, Life Technologies, Carlsbad, California, USA). Chromatograms were read in Chromas (version 2.6.6; Technelysium, South Brisbane, Australia). The obtained sequences (Table 2) were used to search the GenBank collection for highly similar sequences, and both query coverage and percent identity were considered during this alignment (Table 3). Subsequently, the efficiency of the real-time PCR reaction for each primer pair was checked by constructing a relative standard curve from 5-fold serial dilutions of the pooled sample. A total of four dilutions were prepared (0.2, 0.04, 0.008 and 0.0016), which, together with the undiluted sample, made up the five-point relative standard curve.
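The efficiency calculation from the five-point relative standard curve is not spelled out above. The sketch below assumes the widely used relation E = 10^(-1/slope) - 1, where the slope is that of the Ct values regressed on log10 of the relative quantity; a slope of about -3.32 corresponds to 100% efficiency. Instrument software typically reports this value itself, so the code only illustrates the arithmetic. The helper names and the synthetic Ct values are hypothetical and are not the study's data.

```python
import math
from typing import List, Sequence, Tuple


def dilution_series(fold: float = 5.0, n_dilutions: int = 4) -> List[float]:
    """Relative quantities: the undiluted sample (1.0) plus n serial fold-dilutions."""
    return [fold ** -k for k in range(n_dilutions + 1)]


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(quantities: Sequence[float], ct_values: Sequence[float]) -> float:
    """Efficiency in % from the slope of Ct versus log10(relative quantity)."""
    log_q = [math.log10(q) for q in quantities]
    slope, _ = linear_fit(log_q, ct_values)
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


quantities = dilution_series()  # 1, 0.2, 0.04, 0.008, 0.0016
# Synthetic Ct values for a reaction that exactly doubles each cycle
# (illustrative only, not measured data): Ct rises by log2(5) per dilution.
synthetic_ct = [20.0 - math.log2(q) for q in quantities]
print(f"efficiency = {amplification_efficiency(quantities, synthetic_ct):.2f}%")
```

With real measurements, the Ct values would be the (averaged) readings for each dilution, and an efficiency between roughly 95% and 105% would be considered acceptable.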

Results
All selected primers gave products highly similar to the intended targets (Table 3). Query coverage of the analyzed PCR products against the target gene sequences of the compared species was high for most primers (≥ 90%). Slightly lower values (from 88% to 90%) were observed for both the forward and reverse sequencing products of MCT1 and for the reverse sequencing product of GPR41. In terms of percent identity, all investigated products showed high sequence identity (> 90%) to the intended target among the compared species. The reaction efficiency for the selected primers ranged from 95.13% to 104.46% (mean 100.33% ± 3.00%; Table 1).
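The two similarity measures reported above can be illustrated with a simplified sketch for a single pairwise alignment. It assumes percent identity is the number of identical positions divided by the alignment length (gap columns included) and query coverage is the fraction of the query read spanned by the alignment. BLAST computes these per high-scoring pair and combines several pairs for query coverage, so this is an approximation, and the function names and example alignment are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical positions divided by alignment length (gap columns included), in %."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    matches = sum(
        1 for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_coverage(query_length: int, q_start: int, q_end: int) -> float:
    """Share of the query covered by one alignment, in %.

    q_start and q_end are 1-based, inclusive query coordinates of the aligned segment.
    """
    return 100.0 * (q_end - q_start + 1) / query_length


# Hypothetical example: a 200-nt sequencing read aligned over positions 11-190.
print(query_coverage(200, 11, 190))                  # 90.0
print(percent_identity("ACGTACGT-A", "ACGTTCGTTA"))  # 80.0
```

Keeping the two measures separate matters: a read can match its target almost perfectly (high identity) while only part of it aligns (lower coverage), for example when low-quality bases at the ends of a Sanger read are left unaligned.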

Discussion
Comparative sequence analysis makes it possible to decode the information contained in the DNA of newly sequenced species, to identify coding and noncoding regions and to find regulatory elements [3,6]. However, it usually involves analyzing and processing sequences that already exist. In our study, we took the opposite approach and attempted to predict what a particular gene sequence might look like in a species that has not yet been sequenced.
During primer design, one goal was to target the longest possible fragment of the coding sequence that was still conserved between species, because the longer the amplicon, the more accurately it can be identified in a GenBank search. On the other hand, it is generally recommended that the amplicon length for real-time PCR analysis should be in the range of 50-150 bp, or up to 200 bp [5,7-9]. However, studies have shown that longer amplicons (>200 bp and up to ~250 bp) can still be amplified with similar efficiency [10,11]. Hence, the maximal length of the designed amplicons in the present study was set at ~250 bp.
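The trade-off described above, choosing the longest conserved product while staying under a length cap, can be sketched as a simple search over candidate primer sites. The sketch assumes candidate forward-primer start positions and reverse-primer binding-site end positions have already been located in conserved regions; the 250 bp cap follows the text, while the 50 bp lower bound, the coordinate convention and the function name are assumptions for illustration. It ignores other design criteria (melting temperature, GC content, primer dimers) that a tool such as Primer-BLAST also evaluates.

```python
from typing import List, Optional, Tuple


def longest_amplicon(
    forward_starts: List[int],
    reverse_ends: List[int],
    max_len: int = 250,
    min_len: int = 50,
) -> Optional[Tuple[int, int, int]]:
    """Choose the primer-site pair that gives the longest amplicon within limits.

    forward_starts: 0-based positions where candidate forward primers begin.
    reverse_ends:   exclusive end positions (top-strand coordinates) of
                    candidate reverse-primer binding sites.
    The amplicon spans [forward_start, reverse_end), so its length is
    reverse_end - forward_start. Returns (start, end, length) or None.
    """
    best = None
    for f in forward_starts:
        for r in reverse_ends:
            length = r - f
            if min_len <= length <= max_len and (best is None or length > best[2]):
                best = (f, r, length)
    return best


# Hypothetical candidate sites located in conserved regions of an alignment.
print(longest_amplicon([10, 40, 120], [200, 280, 400]))               # (40, 280, 240)
print(longest_amplicon([10, 40, 120], [200, 280, 400], max_len=150))  # (120, 200, 80)
```

Tightening the cap to the commonly recommended 150 bp in the second call shows how the same conserved sites can yield a much shorter, and therefore less informative, product.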
Within this limit, the longest amplicon obtained was for ACTB (248 bp). As mentioned above, amplicon length may affect reaction efficiency [5,10]. In the present study, however, the efficiency for all analyzed primer sets was very high (within the recommended range of 95-105% [5,12]), and for ACTB it equaled 103.2%. The detection chemistry has also been found to affect reaction efficiency, with SYBR® Green being less sensitive to amplicon size than double-dye probes such as TaqMan® probes [10]. Thus, the use of SYBR® Green chemistry may explain why we did not observe any loss of reaction efficiency.
As shown above, the GenBank mRNA sequence database search revealed that all obtained amplicons had high sequence similarity to the target gene sequences of species related to Reeves's muntjac. High sequence identity is considered a good indicator of functional similarity [13]. This led us to conclude that the presented method of primer design appears reliable and makes it possible to study gene expression in less studied animal species.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cimb43030111/s1, Table S1: List of the sequences (with GenBank accession numbers) used for the comparative analysis of the selected genes.